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Abstract. We discuss the computational complexity of random 2D Ising 
spin glasses, which represent an interesting class of constraint satisfac- 
tion problems for black box optimization. Two extremal cases are con- 
sidered: (1) the ±J spin glass, and (2) the Gaussian spin glass. We also 
study a smooth transition between these two extremal cases. The com- 
putational complexity of all studied spin glass systems is found to be 
dominated by rare events of extremely hard spin glass samples. We show 
that complexity of all studied spin glass systems is closely related to 
Frechet extremal value distribution. In a hybrid algorithm that com- 
bines the hierarchical Bayesian optimization algorithm (hBOA) with a 
deterministic bit-flip hill climber, the number of steps performed by both 
the global searcher (hBOA) and the local searcher follow Frechet distri- 
butions. Nonetheless, unlike in methods based purely on local search, 
the parameters of these distributions confirm good scalability of hBOA 
with local search. We further argue that standard performance measures 
for optimization algorithms — such as the average number of evaluations 
until convergence — can be misleading. Finally, our results indicate that 
for highly multimodal constraint satisfaction problems, such as Ising spin 
glasses, recombination-based search can provide qualitatively better re- 
sults than mutation-based search. 



1 Introduction 



The spin glass problem is an old-standing but still intensively studied problem 
in physics pp. First, experimental realizations of spin glass systems do exist 
and their properties, in particular their dynamics, are still not well explained. 
Second, spin glasses pose a challenging, unsolved problem in theoretical physics 
since the nature of the spin glass state at low temperatures is not understood. It 
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is widely believed that this is due to the intrinsic complexity of the rough energy 
landscape of spin glasses. 

In statistical physics, one usual goal is to calculate a desired quantity (e.g. 
magnetization) over a distribution of configurations of a spin glass system for a 
given temperature. The probability of observing a specific spin configuration, C, 
of the spin glass is governed by the Boltzmann distribution, that is to say it is 
inversely proportional to the exponential of the ratio of its energy and tempera- 
ture : p{C) ^ exp(— £'(C)/r). Thus, as temperature decreases, the distribution 
of possible configurations of the spin glass concentrates near the configurations 
with minimum energy, which are also called ground states. The ground-state 
properties capture most of the low temperatures physics, and it is therefore very 
interesting to find and study them. 

From another perspective, spin glasses represent an interesting class of prob- 
lems for black-box optimization where the task is to find ground states of a given 
spin glass sample, because the energy landscape in most spin glasses exhibits fea- 
tures that make it a challenging optimization benchmark. One of these features 
is the large number of local optima, which often grows exponentially with the 
number of decision variables (spins) in the problem. Because of the large number 
of local optima, using local search operators, such as mutation, is almost always 
intractable. 

In this paper we present, analyze, and discuss a series of experiments on 
2D Ising spin glasses. Random spin glass instances for a fixed lattice geome- 
try (square lattice) are generated by randomly sampling a fixed distribution of 
coupling constants. We distinguish two basic classes of random 2D Ising spin 
glass systems: (1) coupling constants are initialized randomly to either -1-1 or 
— 1, and (2) coupling constants are generated from a zero- mean Gaussian distri- 
bution. A transition between these two cases is also considered. We apply the 
hierarchical Bayesian optimization algorithm (hBOA) with local search to all 
considered classes of spin glasses, and provide a thorough statistical analysis of 
hBOA performance on a large number of problem instances in each class. The 
results are discussed in the context of state-of-the-art Monte Carlo methods, 
such as the Wang-Landau algorithm [2j and the multicanonical method [Sj. Fi- 
nally, we identify important lessons from this work for genetic and evolutionary 
computation. 

In the following we present a short review of the hierarchical Bayesian op- 
timization algorithm and extremal value distributions used in the statistical 
analysis. In section |21 we define the 2D Ising spin glass systems analyzed in this 
work, and introduce several classes of random spin glass instances. Section 0] 
presents experimental methodology and results. Finally, Section |H1 summarizes 
and concludes the paper. 
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2 Numerical methods and statistical analysis 

This section briefly discusses the hierarchical Bayesian optimization algorithm 
(hBOA) ^4.5j and extremal value distributions, which will be used to analyze 
experimental results. 

2.1 Hierarchical Bayesian optimization algorithm (hBOA) 

The hierarchical Bayesian optimization algorithm (hBOA) (4i5) is one of the 
most advanced genetic and evolutionary algorithms based primarily on selec- 
tion and recombination. hBOA evolves a population of candidate solutions to a 
given problem. Using a population of solutions as opposed to a single solution 
has several advantages; for example, it enables simultaneous exploration of mul- 
tiple regions in the search space, it can help to alleviate the effects of noise in 
evaluation, and it allows the use of statistical and learning techniques to identify 
regularities in the black-box optimization problem under consideration. 

The first population of candidate solutions is usually generated according to 
uniform distribution over all candidate solutions. The population is updated for 
a number of iterations using two basic operators: (1) selection, and (2) varia- 
tion. The selection operator selects better solutions at the expense of the worse 
ones from the current population, yielding a population of promising candi- 
dates. The variation operator starts by learning a probabilistic model of the 
selected solutions that encodes features of these promising solutions and the in- 
herent regularities. hBOA uses Bayesian networks with local structures [HI to 
model promising solutions. The variation operator then proceeds by sampling 
the probabilistic model to generate new solutions. The new solutions are incorpo- 
rated into the original population using the restricted tournament replacement 
(RTR) |7], which ensures that useful diversity in the population is maintained 
over long periods of time. A more detailed description of hBOA can be found 
in 0. 

To improve candidate solutions locally, hBOA applies a deterministic bit- 
flip hill-climber to each newly generated candidate solution that improves the 
solution by single-bit flips until no further improvement is possible. Flips that 
produce better solutions are of higher priority. It was previously shown that 
local search can significantly reduce population sizes for various optimization 
problems, including the spin glass problem [H]. 

2.2 Extremal value distributions 

Several quantities related to the computational complexity studied in this work 
are found to follow extremal value distributions. The central limit theorem for 
extremal values states that the extremes of large samples are distributed ac- 
cording to one of three extremal value distributions, depending on whether their 
shapes are fat-tailed (tails decay polynomially), exponential (tails decay expo- 
nentially), or thin-tailed (tails decay faster than exponentially) ^]. The inte- 
grated probability density function for any of these extremal value distributions 
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can be written as 

ffj;^;^(a;)=exp(^-(l + €^y^, (1) 

where /x is the location parameter, is the scaling parameter, and ^ is the 
shape parameter that indicates how fast the; tail dec;ays. If ^ < 0, iJj;p;/3(a;) 
represents the Frechet distribution (polynomial decay) , if ^ = it represents the 
Gumbel distribution (exponential decay), and if ^ > it represents the WeibuU 
distribution (faster than exponential decay). Distributions encountered in this 
work are Frechet distributions, where the shape parameter ^ determines the 
power law decay of the fat tails of the distribution 

^-^e;/^;/^ ™, x-^^-^IO . (2) 
dx 

From this asymptotic behavior one can see that the m-th moment of a fat 
tailed Frechet distribution (with ^ < 0) is well defined only if |^| < 1/m. 



3 The Ising spin glass 

A 2D spin glass system consists of a regular 2D grid containing N nodes which 
correspond to the spins. The edges in the grid connect nearest neighbors. Ad- 
ditionally, edges between the first and the last element in each dimension are 
added to introduce periodic boundary conditions, for an example 2D spin glass 
structure consisting of 9 spins distributed on a 3 x 3 square lattice. 

With each edge there is a real-valued constant associated which gives the 
strength of spin-spin coupling. For the classical Ising model each spin can be in 
one of two states: +1 or —1. Each possible set of values for all spins is called 
a spin configuration. Given a set of (random) coupling constants, Jjj, and a 
configuration of spins, C, the energy can be computed as 

E{C) = SiJijSj , (3) 

where i,j G {0, 1, ... , A''—!} denote the spins (nodes) and nearest neighbors 
on the underlying grid (allowed edges). The random spin-spin coupling constants 
Jij for a particular spin glass instance are given on input. 

In statistical physics, the usual task is to integrate a known function over all 
possible configurations of spins, where the configurations are distributed accord- 
ing to the Boltzmann distribution. Probability of encountering a configuration, 
C at temperature T is given by 

j:cexp{-E{C)/T) 

From the physics point of view, it is interesting to know the ground states 
(configurations associated with the minimum possible energy). Finding extremal 
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energies then corresponds to sampling the Boltzmann distribution with temper- 
ature approaching and thus the problem of finding ground states is simpler 
a priori than integration over a wide range of temperatures. However, most of 
the conventional methods based on sampling the above Boltzmann distribution 
0]fail to find the ground states configurations because they get often trapped in 
a local minimum. 

The problem of finding ground states is a typical optimization problem, 
where the task is to find an optimal configuration of spins that minimizes en- 
ergy. Although polynomial-time deterministic methods exist for both types of 
2D spin glasses 11 12 , most algorithms based on local search operators, in- 
cluding a (1+1) evolution strategy, conventional Monte Carlo simulations, and 
Monte Carlo simulations with Wang-Landau '^^ or multicanonical sampling [3], 
scale exponentially and are thus impractical for solving this class of problems. 
The origin for this slowdown is due to the suppressed relaxation times in the 
Monte Carlo simulations in the vicinity of the extremal energies because of the 
enormous number of local optima in the energy landscape. Recombination-based 
genetic algorithms succeed if recombination is performed in a way that interact- 
ing spins are located close to each other in the representation; fc-point crossover 
with a rather small k can then be used so that the linkage between contiguous 
blocks of bits is preserved (unlike with uniform crossover, for instance). However, 
the behavior of such specialized representations and variation operators cannot 
be generalized to similar slowly equilibrating problems which exhibit different 
energy landscapes, such as protein folding or polymer dynamics. 

In order to obtain a quantitative understanding of the disorder in a spin glass 
system introduced by the random spin-spin couplings, one generally analyzes a 
large set of random spin glass instances for a given distribution of the spin-spin 
couplings. For each spin glass instance the optimization algorithm is applied and 
results statistically analyzed to obtain a measure of computational complexity. 
Here we first consider two types of initial spin-spin coupling distributions, the 
± J spin glass and the Gaussian spin glass. 

3.1 The ±J spin glass 

For the ± J Ising spin glass, each spin-spin coupling constant is set randomly to 
either +1 or —1 with equal probability (see lower right panel in Figure^. En- 
ergy minimization in this case can be transformed into a constraint satisfaction 
problem, where the constraints relate spins connected by a coupling constant. 
If Jij > 0, then the constraint requires spins i and j to be different, whereas if 
Jij < 0, then the constraint requires spins i and j to be the same. Energy is 
minimized when the number of satisfied constraints is maximized. 

3.2 Gaussian spin glasses 

In the Gaussian spin glass, coupling constants are generated according to a zero- 
mean Gaussian distribution with variance one (see upper left panel in Figure^. 
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Fig. 1. Distribution of coupling constants for the transition from the Gaussian 
(upper left) to the ±J spin glass (lower right). 

For real-valued couplings, energy minimization can be casted as a constraint 
satisfaction problem with weighted constraints. 

3.3 Transition between ibJ and Gaussian spin glasses 

To describe a smooth transition between the ± J and the Gaussian spin glass we 
vary the distribution of spin-spin coupling constants by defining a distribution 
as the sum of two Gaussian distributions, described by means, ±//, and variance, 
a, in such a way that the overall mean becomes fi — and the overall variance 
(7 = 1. The explicit form of the two Gaussians is thus given by — 1~ p,^. The 
±J spin glass {p, — 1) and the Gaussian spin glass (/i = 0) then describe the 
extremal cases of this new family of distributions. The transition between the 
two extrema is then described by varying jl between and 1 which is illustrated 
in Figure lUfor /t = 0, 0.60, 0.80, 0.95, 0.99, 1. 

4 Numerical experiments 

In the following we describe the numerical experiments in more detail and present 
results for the spin glasses described above. 

4.1 Description of experiments 

For ±J and Gaussian 2D spin glasses, systems with equal number of spins in 
each dimension were used of size from n = 8 x 8 to n = 20 x 20. For each 
system size, 1000 random samples were generated. hBOA with the deterministic 
local searcher was then applied to find the ground state for each sample. For the 
transition from ± J to Gaussian spin glasses, we focused on a single system size, 
n = 10 X 10. 
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Gaussian, 14x14 
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Fig. 2. Distribution of Eg for ±J spin glass systems of varying size. Eg and the 
density function arc normalized using fi and f3. 



For each spin glass sample, the population size in hBOA is set to the mini- 
mum population size required to find the optimum in 10 independent runs. The 
minimum population size is determined using bisection. The width of the final 
interval in bisection is at most 10% of its higher limit. Binary tournament selec- 
tion without replacement is used. The windows size in RTR is set to the number 
of spins of the system under consideration, but it is always at most equal to 5% 
of the population size. The 5% cap on the window size is important to ensure 
fast convergence with even small populations. The cap explains the difference 
between the results presented here and the previous results, because populations 
are usually very small for hBOA with local search on Ising spin glasses [H]. 

Performance of hBOA was measured by (1) Eg, the total number of spin 
glass system configurations examined by hBOA (the number of restarts of the 
local searcher), and (2) El, the total number of steps of the local hill climber. 
Due to the lack of space, we only analyze Eg- El was greater than Eg by a factor 
of approximately 0{y/n). Clearly, we can expect that Eg < El- Nonetheless, it 
is computationally much less expensive to perform a local step in the hill climber 
than to evaluate a new spin glass configuration sampled by hBOA. 



4.2 Results for ibJ and Gaussian couplings 

The first important observation is that the distributions of Eg and El for all 
problem sizes and distributions of coupling constants follow Frechet extremal 
value distributions. Applying a maximum likelihood estimator we can determine 
the parameters /x, f3, and ^ of these distributions defined in Equation (1). Figure[3 
shows the histograms and the corresponding probability density function for Eg 
for ± J spin glasses of various sizes. 
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Fig. 3. Location /i and shape ^ for ± J and Gaussian spin glasses using maximum 
likelihood estimation. Standard error of the estimations displayed with error 
bars. 

The location parameter /i indicating the most likely value of Eq can be 
used to determine the scalability of hBOA. In Figures |2t and Ob the location 
parameter for both ± J and Gaussian spin glasses are shown versus the system 
size. Double logarithmic plots confirm that the location has an upper polynomial 
bound. For the ±J spin glass, the order of that polynomial approaches 1.5 as 
system size n grows, whereas for Gaussian couplings, the order of the polynomial 
seems to approach 2.2. 

Figure Et and Otl show the shape ^ for both ±J and Gaussian spin glasses 
with respect to the system size. Since it is always smaller than 1, we conclude 
that the mean is well-defined for all cases. For the variance (2nd moment) we 
find the shape parameter to be smaller than 1/2 only for systems larger than 
n = 10 X 10. Thus, for system smaller than ri = 10 x 10 the variance is not 
well-defined and the mean has an infinite error. 

4.3 Results for the transition between ibJ and Gaussian couplings 

For the transition between ± J and Gaussian couplings, Eq and E^ also follow 
Frechet distributions. Figure 01 shows the distribution of Eq in the transition, 
including ± J and Gaussian cases. Figure |S1 shows location and shape parameters 
for the transition. 

We can see that both location and shape parameters for the transition be- 
tween ±J and Gaussian couplings lie between the corresponding parameters 
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Fig. 4. Distribution of Eg for the transition from ± J to Gaussian spin glasses 
for 71 = 10 X 10. 



for the two extreme cases. That means that considering the two extreme cases 
provides insight not only in the cases themselves, but it can be used to guide 
estimation of parameters for a large class of other distributions of couplings. 



5 Discussion 



In the following we discuss the experimental results, first in the context of hBOA 
scalability theory and then in comparison with flat-histogram Monte Carlo re- 
sults ■ We close by presenting some general conclusions for genetic and evo- 
lutionary computation. 



5.1 Experimental results and hBOA theory 

An interesting question is whether the results obtained can be explained using 
hBOA convergence theory designed for a rather idealized situation, where the 
problem can be decomposed into subproblems of bounded order over multiple 
levels of difficulty. For random 2D Ising spin glasses, it can be shown that for a 
complete single-level decomposition it would be necessary to consider subprob- 
lems of order proportional to \/n as hypothesized by Miihlenbein |14| . which 
would lead to exponentially sized populations ^KJ- Despite this, the number of 
function evaluations grows as a low-order polynomial of the number n of spins 
as predicted by hBOA scalability theory for decomposable problems of bounded 
difficulty |15| . Spin glasses with ±J couplings correspond to uniform scaling, 
where the theory predicts 0(n^-^^) evaluations; indeed the location parameter /x 
indeed seems to approach a polynomial of order approx. 1.5. Spin glasses with 
Gaussian couplings exhibit a non-uniform scaling, where exponential scaling can 
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(a) Location (b) Shape 

Fig. 5. Location /i and shape ^ for the transition between ± J and Gaussian 
spin glasses. Standard error of the estimations displayed with error bars. X-axis 
denotes the distance fl of the means used to generate couplings. 

be taken as a bounding case. For exponential scaling, the number of evaluations 
would be predicted to grow as O(n^); here the location parameter seems to grow 
slightly faster with a polynomial of order approx. 2.2. However, the order of this 
polynomial decreases with problem size. 

5.2 Comparison to flat-histogram Monte Carlo 

Monte Carlo (MC) methods are usually used to integrate a function f{x) with 
some probability density distribution over the input parameter x. The common 
approach is to sample a series of values of x according to the specified probability 
distribution, and averaging the values of f{x). 

While conventional MC has been successfully used in numerous applications, 
it sometimes produces inferior results for low temperatures because the random 
walk through the space of all possible configurations (values of x) of the system 
has difficulties in overcoming energy barriers. One of the ways to alleviate this 
difficulty is to modify the simulated statistical-mechanical ensemble and use 
Wang-Landau sampling |2] to sample each energy level equally likely, thereby 
producing a flat histogram. The Wang-Landau algorithm thus represents a class 
of methods also known as flat-histogram MC. This approach not only alleviates 
the problem of energy barriers, but it also enables computation of the number 
of configurations at different energy levels, which can in turn be used to quickly 
compute thermal averages for any given temperature without having to rerun 
the simulation. 

For flat-histogram MC, the distribution of round-trip times in energy mea- 
sured by the total number of applications of local operators was recently shown 
to follow Frechet distributions However, the absolute value of the shape pa- 
rameter for flat-histogram MC was shown to approach 1. As a result, the mean 
of this distribution is not defined. Further, the location parameter found for flat- 
histogram MC grows exponentially ^Sjj although for this class of spin glasses 
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it is possible to analytically compute the entire energy spectrum in polynomial 
time, 0{n^-^) 11 . 

5.3 Important lessons for genetic and evolutionary computation 

The results presented in this paper indicate that it can be misleading to esti- 
mate the mean convergence time by an average over several independent sam- 
ples (runs), because in some cases the mean, variance, and other moments of 
the respective distribution may become ill-defined. In this work, the location pa- 
rameter serves as a well-defined quantity to express computational complexity 
of various optimization and simulation techniques, including hBOA and flat- 
histogram MC. It can be expected that similar distribution will be observed 
for other evolutionary algorithms, as they reflect intrinsic properties of the spin 
glass 23 ■ 

Random 2D Ising spin glasses represent interesting classes of constraint sat- 
isfaction problems with a large number of local optima. The results presented in 
this work indicate that for such classes of problems, recombination-based search 
can provide optimal solutions in low-order polynomial time, whereas mutation- 
based methods scale exponentially. However, local search is still beneficial for 
local improvement of solutions in recombination-based evolutionary algorithms, 
because incorporating local search decreases population sizing requirements. A 
similar observation was found for MAXS AT j2j . 

6 Conclusions 

Random classes of Ising spin glass systems represent an interesting class of con- 
straint satisfaction problems for black-box optimization. Similar to flat- histogram 
MC, computational complexity of hBOA — expressed in the number of solutions 
explored by both hBOA and the local hill climber until the optimum — is found 
to show large sample-to-sample variations. The obtained distribution of opti- 
mization steps follow a fat-tailed Frechet extremal value distribution. However, 
for hBOA the shape parameter defining the decay of the tail is small enough for 
the first two moments of the observed distributions to exist for all but small- 
est system sizes. The location parameter as well as the mean of this distribution 
scale like a polynomial of low order. The experiments show that similar behavior 
can be observed for ± J and Gaussian spin glasses, as well as for the transition 
between these two cases. For ± J spin glasses, performance of hBOA agrees with 
scalability theory for hBOA on uniformly scaled problems, whereas for Gaussian 
spin glasses, performance of hBOA agrees with scalability theory for hBOA on 
exponentially scaled problems. 

There are some general conclusions for genetic and evolutionary computa- 
tion. First, measuring time complexity by the average number of function evalu- 
ations until the optimum is found can sometimes be misleading when rare events 
dominate the sample-to-sample variations. Second, it was shown for this specific 
problem that recombination-based search can elficiently deal with exponentially 
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many local optima and still find the global optimum in low-order polynomial 
time. 
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